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ABSTRACT 


With the continuous development of remote sensing technology and satellite | 
communication technology, the higher is the resolution of remote sensing 
images, and the richer is information contained in the image. How to obtain 
information from remote sensing images quickly and accurately is the key 
problem in the application of remote sensing technology. This paper 
introduces the automatic detection of ship targets in visible light remote 
sensing images, analyzes the methods and problems of target detection and 
semantic segmentation based on deep learning, which improves the accuracy 
of ship detection. Finally, this paper summarizes the short comings of the 
existing methods and prospects of the future research trend. 
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I. INTRODUCTION 

Remote sensing images [1] are captured the real image of the 
earth's surface by remote sensing satellites. Through remote 
sensing images, people can intuitively understand the 
ground buildings, road conditions, natural environment, and 
other information. According to the types of sensors, remote 
sensing images can be divided into synthetic aperture radar 
(SAR) image, laser radar image, thermal infrared image, 
optical sensor (including visible light sensor) image, etc. 
Optical remote sensing mainly refers to the sensors 
operating in the visible and, that is, 0.38-0.76 micron. Visible 
images are intuitive and easy to understand with high spatial 
resolution. When the light and sunny weather conditions are 
well, high-resolution visible images can provide lots of 
information, obvious target structure characteristics. In the 
aspect of remote sensing specific target detection, the visible 
image has incomparable advantages such as SAR image and 
multispectral image [2]. Therefore, visible remote sensing 
images have been paid more and more attention when 
researchers extract the content of interest, the detection and 
recognition of specific targets from remote sensing images. 


In recent years, with the continuous development of remote 
sensing technology and satellite communication technology, 
the higher is the resolution of remote sensing images and the 
richer is information contained in the image. How to obtain 
information from remote sensing images quickly and 
accurately is a key problem in the application of remote 
sensing technology. The target detection of remote sensing 
image mainly aims at the given remote sensing image, and 
automatically identify the interesting objects in the image, 
such as vehicle, aircraft, ship, port, airport, etc. As an 
important part of remote sensing, ship detection plays an 
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important role in the military and civil fields such as 
maritime security, maritime traffic, border control, fishery 
management, etc. China is rich in marine resources. Ship 
detection based on remote sensing images is widely used in 
military and civil fields. For example, in a specific sea area or 
port, remote sensing technology can realize the monitoring 
and management of illegal fishing and smuggling ships. Ships 
may be in distress and loss of contact occur from time to 
time in bad weather conditions, while remote sensing 
technology can quickly and accurately detect the location of 
ships in distress, which is conducive to the rescue work[3]. 
In recent years, with the rapid development of optical 
remote sensing imaging technology, "Gaojing-1", 
"worldview-4", "GaoFen-6" and other optical remote sensing 
satellites have been launched one after another, which can 
continuously provide massive high-resolution images and 
bring new opportunities and challenges to the development 
of ship detection technology. Therefore, ship detection of 
remote sensing images has important research significance 
and broad application prospects. 


The traditional target detection algorithm is mainly based on 
image processing to extract features manually. This method 
is not accurate when processing complex images. The 
vigorous development of deep learning technology provides 
a very effective tool for remote sensing image target 
detection. Although the target detection technology based on 
deep learning has achieved remarkable success in natural 
images, it is difficult to directly apply this method to remote 
sensing images because of the significant difference between 
remote sensing images and natural scene images. Compared 
with the traditional image, the remote sensing image has the 
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characteristics of a large amount of data, complex 
background, small target size, and so on. At the same time, 
because the remote sensing image is a top view, different 
projection directions make the same target has different 
rotation angles, which makes the angle direction of the 
target arbitrary. Due to the particularity of remote sensing 
image, the current mainstream general target detection 
method based on deep learning cannot play an optimal role 
in remote sensing image target detection. Therefore, it is of 
great significance to study how to apply deep learning 
methods to target detection in satellite visible light remote 
sensing images to effectively improve the accuracy of target 
detection and reduce the detection time. 


II. Related work on target detection 

Target detection [4] is a basic research task in computer 
vision, which is mainly used to identify the types and 
locations of targets in images. Target detection technology is 
widely used in industrial product detection, airport security 
inspection, robot navigation, video monitoring, target 
tracking, remote sensing, and many other fields. Through 
using computer technology to replace the traditional manual 
detection, it is an important practical significance for 
improving production efficiency and reducing production 
cost. Therefore, target detection has become a hot topic in 
the academic and industrial circles this year. At the same 
time, as an important branch of computer vision, target 
detection technology also plays an important role in the 
following target tracking, semantic segmentation [5,6], 
instance segmentation [7]. 


The traditional target detection algorithm is mainly based on 
SIFT [8] and HOG [9] to extract features manually. In 2012, 
Alexnet [10] made a breakthrough in the ILSVRCimage 
classification competition, and researchers began to use the 
deep convolution neural network method to solve computer 
vision problems. At present, the common network structures 
include Google net [11], VGG [12], Darknet [13], ResNet [14], 
DenseNet [15], etc., which are designed to improve the 
accuracy and speed of feature extraction from different 
aspects. At present, the general target detection methods 
based on convolutional neural networks can be divided into 
two categories: the one-stage detection method and the two- 
stage detection method. In the two-stage detection method, 
candidate regions are generated from the image firstly, and 
then the candidate regions are classified. The main methods 
are R-CNN, Fast R-CNN, FasterR-CNN [16].One-stage 
detection methods mainly include YOLO, YOLO9000, 
YOLOv3[17], SSD[18], etc. 


Il. Ship detection methods 

With the development of remote sensing technology, people 
can understand the earth from a better perspective. In recent 
years, with the improvement of remote sensing image 
resolution, ship detection in optical remote sensing images is 
a research hotspot, and many different methods have been 
proposed. Generally speaking, these methods can be divided 
into two categories: traditional methods and deep learning 
methods. The traditional ship detection methods are divided 
into three steps: the first step is the separation of land and 
sea, which aims at separating the target from the ocean, the 
second step is candidate detection, which shows that pixels 
represent possible ships, and the last step is classification, 
which identifies one of all the detected candidates in the 


previous step of the real target. Traditional methods are 
mainly based on geometric features of images. 


Yang et al [19] proposed a detection method based on sea 
surface analysis and used a newlinear function combining 
pixel and region features to select candidate ships. Shi et al 
[20] extended the existing HOG features and some additional 
information to improve the ability of ship identification. Qi et 
al [21] used visual saliency to extract candidate regions and 
improved the hand feature based on HOG. Traditional 
detection methods mainly use manual features to detect 
ships. These methods have achieved good results in open 
waters, but it is difficult to detect in complex backgrounds. 
With the development of deep learning, the methods based 
on convolutional neural networks have surpassed the 
traditional methods in many computer vision tasks. Inspired 
by the successful application of convolutional neural 
networks in natural scenes, many detection methods based 
on deep learning have been applied to ship detection in 
remote sensing images. To improve the detection accuracy, 
Huang et al [22] proposed a squeeze excitation hop 
connection path network (SESPNets) using a path-level hop 
connection structure to improve the feature extraction 
ability. Zhang et al [23] improved the original Fast R-CNN 
structure to improve the detection results of small-scale 
distributed ships. You et al [24] proposed a DCNN 
framework to deal with multi-scale ship detection. You et al. 
[25] proposed an end-to-end method called scene mask R- 
CNN to reduce onshore false-positives. These methods based 
on CNN improve the efficiency and detection accuracy and 
liberate researchers from the tedious feature processing. 
However, these methods can only generate horizontal 
boundary boxes, which are not suitable for targets placed in 
any direction in remote sensing images. 


Due to the great differences between remote sensing images 
and natural scenery images, it is still difficult to apply it 
directly to remote sensing images. Different from natural 
images, remote sensing images are taken with an aerial view. 
Targets may exist in any direction. When detecting density 
targets, especially with a high length-width ratio, ordinary 
targets often are missed in the NMS step by relying on the 
horizontal bounding box method. In contrast, the rotating 
bounding box with an arbitrary rotation angle is more 
suitable for ship detection. To solve this problem, 
researchers began to introduce a rotating bounding box, 
which was initially used for text detection. Yang et al. [26] 
proposed a framework called rotation dense feature pyramid 
network (R-DFPN), which can effectively detect ships in 
different scenes such as ocean and port. Ma et al. [27] 
proposed a two-stage CNN ship detection method based on 
ship center and azimuth prediction, which can accurately 
detect ships in any direction in optical remote sensing 
images. Tian et al. [28] proposed a framework integrating 
multi-scale feature fusion network, rotating region 
recommendation network, and context pool, which realized 
accurate positioning and false alarm suppression. Zhang et 
al. [29] proposed a rotating area recommendation network 
(R2PN), which uses the azimuth information of ships to 
generate multi-directional suggestions. At present, most of 
the methods are two-stage. These methods can predict the 
direction of the target and improve detection accuracy. 
However, due to the existence of RPN network, these 
methods use complex network structures and affect the 
detection velocity. 





@IJTSRD | Unique Paper ID -IJTSRD35823 | 


Volume - 5 | Issue - 1 


| November-December 2020 Page 56 


International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470 


IV. Semantic segmentation methods 

The semantic segmentation combines image classification, 
target detection, and image segmentation. Image semantic 
segmentation is to classify every pixel in the image. By 
semantic segmentation of remote sensing images, we can 
separate the specific semantic pixels and obtain boundary 
information to improve the accuracy of target detection. 
Image semantic segmentation methods include traditional 
methods and methods based on convolution neural 
networks. The traditional semantic segmentation methods 
mainly include threshold-based segmentation, edge-based 
segmentation, region-based segmentation, and so on. 


A. Image segmentation methods based on threshold 
The threshold-based segmentation is to calculate one or 
more gray thresholds based on the gray characteristics of the 
image, compare the gray value of each pixel with the 
threshold, and finally classify the target into appropriate 
categories according to the pixel comparison results. 


The advantages of threshold-based segmentation are simple, 
high efficiency, and speed. The global threshold can 
effectively segment different targets and backgrounds witha 
great difference in the gray level. Local threshold or dynamic 
threshold is more suitable for the targets with little 
difference. Although the threshold-based segmentation 
method is simple and efficient, it also has some limitations. 
The method only considers the gray value of the pixel itself, 
generally does not consider the spatial characteristics, so itis 
very sensitive to noise. In practical application, the 
threshold-based methods are usually combined with other 
methods. 


B. Image segmentation methods based on edge 

The edge refers to the set of continuous pixels on the 
boundary of two different regions in the image, which 
reflects the discontinuity of local features and the mutation 
of image characteristics such as gray, color, texture, etc. The 
edge-based segmentation method detects the edge according 
to the gray value and divides the image into different parts. It 
is based on the observation that the gray value of the edge 
will show a step change. 


C. Image segmentation technology based on Region 
The image is divided into different regions according to the 
similarity criterion. It mainly uses the local space 
information of the image and can avoid the defect of small 
segmentation space brought by other algorithms. However, 
this kind of segmentation method is slow in large area 
segmentation and has poor anti-noise performance, which 
often results in meaningless region segmentation or over- 
segmentation of image. In general, it will be combined with 
other methods to give full play to their advantages to obtain 
a better segmentation effect. 


With the rapid development of deep learning, it is found that 
image semantic segmentation based on deep learning can 
greatly improve accuracy. Most of the traditional 
segmentation methods consider the visual information of the 
image pixel itself without training. Although the time 
complexity is not high, the accuracy is low, and it cannot 
effectively process the scene with complex backgrounds. The 
primary difference between the semantic segmentation 
method based on convolutional neural networks and the 
traditional semantic segmentation method is that the 


network can automatically learn the features of the image 
and carry out end-to-end classification learning, which 
greatly improves the accuracy of semantic segmentation. The 
main idea of the image semantic segmentation method based 
on deep learning is that a large number of original image 
data are directly inputted into the deep network without 
artificial design features. According to the designed deep 
network algorithm, the image data is processed complexly to 
obtain high-level abstract features. The output is no longera 
simple classification category or target positioning but the 
segmentation image with pixel category label. 


Long et al. [30] proposed a semantic segmentation model of 
a full convolution network. Firstly, the features were 
extracted by convolution operation, and then the feature 
map was up sampled and restored to the input size. The 
result of up sampling is fuzzy. It is not sensitive to the details 
of the image, and the segmentation accuracy is poor. Vijay et 
al. [31] proposed the SegNet symmetric semantic 
segmentation model, which can reduce the loss of pixel 
information, improve the resolution, and accurately locate 
the image segmentation boundary. Chen et al. [32] proposed 
the Deeplab model and added CRF based on FCN to improve 
the boundary segmentation accuracy. Ronneberger et al. [33] 
proposed a U-net network to improve the detection accuracy 
by adding feature fusion. When the convolution network is 
used for semantic segmentation, the input image usually 
needs to be adjusted to the input size of the model for the 
fixed-size model. After semantic segmentation, it is restored 
to the original size. Because the modification of image size 
will lose useful information, especially for the small target, it 
increases the difficulty of target segmentation. 


V. Conclusion 

Due to the great difference between remote sensing image 
and natural scenery image, it is still difficult to apply it 
directly to the remote sensing image. Different from natural 
images, remote sensing images are taken from an aerial 
view. Targets in remote sensing images may exist in any 
direction. When detecting densely packed objects, especially 
targets with a large aspect ratio, ordinary targets oftenare 
missed targets in the NMS step by relying on the horizontal 
bounding box method, as shown in Figure 1. The method 
retains the boundary box A with the maximum probability, 
removes other boundary boxes B and C, which results in the 
omission of the real target. 





a . » 


Figure 1falsedetection caused by NMS 


When the full convolution network is used for semantic 
segmentation, the input image usually needs to be adjusted 
to the input size of the model for a fixed size model. After 
semantic segmentation, it is restored to the original size. The 
process of semantic segmentation is shown in Figure 2. 
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Figure 2 semantic segmentation flow based on 
convolutional neural network 


For example, the above network limits the input image to 
480 * 320. Because the modification of image resolution will 
lose useful information in the image, especially for small 
targets. The size in the feature map is very small after down 
sampling, and it is difficult to include all the information 
about the target, which makes it more difficult to segment 
the target. For targets with complex boundaries, the 
detection results are restored to the original image. And the 
boundary is not smooth enough, and it is not sensitive to the 
details of the image. 


At present, the methods based on deep learning in remote 
sensing target detection achieve some achievements, which 
are mainly from the improvement of the target detection 
algorithm in natural scenes. There are significant differences 
between remote sensing images and natural scene images, 
especially in the aspects of target rotation, scale change, and 
complex background. Although researchers have explored 
and studied the ship target detection and recognition 
technology in optical remote sensing image, there are still 
many problems and challenges in ship target detection 
algorithm based on remote sensing image 


1. The environment in optical remote sensing image is 
complex 

Due to the transmission characteristics of optical sensors, 

camera angles, light intensity, weather, and sea background, 

and other factors, the target will be blocked, or incomplete 

and the contrast is low, which makes it more difficult to 

detect the target. 


2. Detection of inshore ships 

Different from the ship on the sea, because the inshore ship 
is often connected with the coast, which has similar 
grayscale and texture of the buildings on the shore and is 
affected by the shadow and side-by-side berthing. Thus, the 
conventional ship detection method cannot achieve the ideal 
detection results for the inshore ship. It is also difficult to 
extract the ship target quickly and accurately from the 
complex port environment. 


3. Real-time performance of ship detection algorithm 
At present, some target detection algorithms have a high 
recognition rate. But because of its high complexity and slow 
detection speed, it cannot meet the real-time requirements. 
With the development of deep learning technology, remote 
sensing image target detection technology is still an open 
problem to be further studied. 
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